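Contextual bandits aim to identify, among a set of arms, the one with the highest reward based on contextual information. Motivated by the fact that arms usually exhibit group behaviors and that groups influence one another, we introduce a new model, the Arm Group Graph (AGG), in which nodes represent groups of arms and weighted edges represent the correlations among groups. To leverage the rich information in AGG, we propose a bandit algorithm, AGG-UCB, in which neural networks are designed to estimate rewards and graph neural networks (GNNs) are used to learn representations of the correlated arm groups. To address the exploitation-exploration dilemma in bandits, we derive a new upper confidence bound (UCB), built on the neural networks used for exploitation, to drive exploration. Furthermore, we prove that AGG-UCB achieves a near-optimal regret bound with over-parameterized neural networks, and we provide a convergence analysis of GNNs with fully connected layers, which may be of independent interest. Finally, we conduct extensive experiments against state-of-the-art baselines on multiple public datasets, showing the effectiveness of the proposed algorithm.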
translated by 谷歌翻译
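Disinformation refers to false information deliberately spread to influence the general public, and its negative impact on society can be observed in numerous issues, such as political agendas and the manipulation of financial markets. In this paper, we identify prevalent challenges and advances related to automated disinformation detection from multiple aspects, and propose a comprehensive and explainable disinformation detection framework called DISCO. It leverages the heterogeneity of disinformation and addresses the opaqueness of prediction. We then provide a demonstration of DISCO on a real-world fake news detection task, with satisfactory detection accuracy and explanations. The demo video and source code of DISCO are now publicly available. We expect that our demo can pave the way for addressing the limitations of identification, comprehension, and explainability as a whole.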
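Contextual multi-armed bandits have been studied for decades and adapted to various applications such as online advertising and personalized recommendation. To address the exploitation-exploration trade-off in bandits, there are three main techniques: epsilon-greedy, Thompson Sampling (TS), and Upper Confidence Bound (UCB). In recent literature, linear contextual bandits have adopted ridge regression to estimate the reward function and combined it with TS or UCB strategies for exploration. However, this line of work explicitly assumes that the reward is a linear function of the arm vectors, which may not hold in real-world datasets. To overcome this challenge, a series of neural bandit algorithms has been proposed, in which a neural network is assigned to learn the underlying reward function and TS or UCB is adapted for exploration. In this paper, we propose EE-Net, a neural bandit approach with a novel exploration strategy. In addition to using a neural network (the exploitation network) to learn the reward function, EE-Net uses another neural network (the exploration network) to adaptively learn the potential gain relative to the currently estimated reward. A decision-maker is then constructed to combine the outputs of the exploitation and exploration networks. We prove that EE-Net achieves $\mathcal{O}(\sqrt{T \log T})$ regret, which is tighter than that of existing state-of-the-art neural bandit algorithms ($\mathcal{O}(\sqrt{T} \log T)$ for both UCB-based and TS-based methods). Through extensive experiments on four real-world datasets, we show that EE-Net outperforms existing linear and neural bandit approaches.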
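As an illustration of the linear baseline mentioned above (ridge regression for the reward estimate, combined with a UCB bonus for exploration), the following is a minimal sketch rather than the EE-Net method itself. It assumes a single parameter vector shared across arms; the class name `LinUCB`, the hyperparameters `alpha` and `lam`, and the synthetic environment in `run_demo` are all illustrative choices, not taken from the abstract.

```python
import numpy as np


class LinUCB:
    """Linear UCB with a ridge-regression reward estimate (shared parameters)."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha          # exploration strength (assumed hyperparameter)
        self.A = lam * np.eye(dim)  # regularized Gram matrix: lam * I + sum of x x^T
        self.b = np.zeros(dim)      # sum of reward-weighted contexts

    def select(self, arms):
        """arms has shape (n_arms, dim); returns the index of the chosen arm."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                      # ridge estimate of the reward weights
        mean = arms @ theta                         # exploitation term
        # Exploration term: sqrt(x^T A^{-1} x) for every arm x.
        width = np.sqrt(np.einsum("ij,jk,ik->i", arms, A_inv, arms))
        return int(np.argmax(mean + self.alpha * width))

    def update(self, x, reward):
        """Fold the observed (context, reward) pair into the ridge statistics."""
        self.A += np.outer(x, x)
        self.b += reward * x


def run_demo(steps=2000, n_arms=5, dim=4, seed=0):
    """Run LinUCB on a synthetic linear-reward problem and return cumulative regret."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=dim)
    theta_star /= np.linalg.norm(theta_star)        # hidden true weights (synthetic)
    agent = LinUCB(dim, alpha=0.5)
    regret = 0.0
    for _ in range(steps):
        arms = rng.normal(size=(n_arms, dim))
        arms /= np.linalg.norm(arms, axis=1, keepdims=True)
        k = agent.select(arms)
        expected = arms @ theta_star
        reward = expected[k] + 0.1 * rng.normal()   # noisy linear reward
        agent.update(arms[k], reward)
        regret += expected.max() - expected[k]
    return regret


if __name__ == "__main__":
    print("cumulative regret:", run_demo())
```

The linearity assumption is built into `mean = arms @ theta`; the neural approaches described in the abstract replace this estimate with a learned network.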
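We study the identification of user clusters in contextual multi-armed bandits (MAB). Contextual MAB is an effective tool for many real-world applications, such as content recommendation and online advertising. In practice, user dependency plays an essential role in users' actions and, consequently, in the rewards. Clustering similar users can improve the quality of reward estimation, which in turn leads to more effective content recommendation and targeted advertising. Unlike traditional clustering settings, we cluster users based on unknown bandit parameters, which are estimated incrementally. In particular, we define the problem of cluster detection in contextual MAB and propose a bandit algorithm, LOCB, that embeds a local clustering procedure. We also provide a theoretical analysis of LOCB in terms of the correctness and efficiency of its clustering and its regret bound. Finally, we evaluate the proposed algorithm from various aspects and show that it outperforms state-of-the-art baselines.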
Optimal transport (OT) has become exceedingly popular in machine learning, data science, and computer vision. The core assumption in the OT problem is the equal total amount of mass in source and target measures, which limits its application. Optimal Partial Transport (OPT) is a recently proposed solution to this limitation. Similar to the OT problem, the computation of OPT relies on solving a linear programming problem (often in high dimensions), which can become computationally prohibitive. In this paper, we propose an efficient algorithm for calculating the OPT problem between two non-negative measures in one dimension. Next, following the idea of sliced OT distances, we utilize slicing to define the sliced OPT distance. Finally, we demonstrate the computational and accuracy benefits of the sliced OPT-based method in various numerical experiments. In particular, we show an application of our proposed Sliced-OPT in noisy point cloud registration.
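To make the slicing idea concrete, the sketch below computes the classical (balanced) sliced Wasserstein distance, where each one-dimensional problem is solved by sorting. It is not the partial-transport (OPT) algorithm proposed in the abstract: it assumes both point clouds have the same number of uniformly weighted points, which is exactly the equal-mass restriction that OPT relaxes. The function names and the parameters `n_proj` and `seed` are illustrative.

```python
import numpy as np


def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two 1-D empirical measures.

    Assumes len(x) == len(y) and uniform weights, so the optimal coupling
    simply matches the sorted samples.
    """
    x, y = np.sort(x), np.sort(y)
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)


def sliced_wasserstein(X, Y, n_proj=128, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    X and Y have shape (n_points, dim). Each random unit direction projects
    both clouds to 1-D, where the distance has the closed form above.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    total = 0.0
    for t in theta:
        total += wasserstein_1d(X @ t, Y @ t, p) ** p
    return (total / n_proj) ** (1.0 / p)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    Y = X + np.array([1.0, 0.0, 0.0])  # the same cloud, translated
    print(sliced_wasserstein(X, Y))
```

Each slice costs one sort per cloud, which is why slicing avoids the high-dimensional linear program mentioned in the abstract.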
Human civilization has an increasingly powerful influence on the earth system. Affected by climate change and land-use change, natural disasters such as flooding have been increasing in recent years. Earth observations are an invaluable source for assessing and mitigating negative impacts. Detecting changes from Earth observation data is one way to monitor the possible impact. Effective and reliable Change Detection (CD) methods can help in identifying the risk of disaster events at an early stage. In this work, we propose a novel unsupervised CD method on time series Synthetic Aperture Radar (SAR) data. Our proposed method is a probabilistic model trained with unsupervised learning techniques, reconstruction, and contrastive learning. The change map is generated with the help of the distribution difference between pre-incident and post-incident data. Our proposed CD model is evaluated on flood detection data. We verified the efficacy of our model on 8 different flood sites, including three recent flood events from Copernicus Emergency Management Services and six from the Sen1Floods11 dataset. Our proposed model achieved an average Intersection over Union (IoU) of 64.53% and an average F1 score of 75.43%. Our IoU score is approximately 6-27% better, and our F1 score approximately 7-22% better, than those of the compared unsupervised and supervised CD methods. The results and extensive discussion presented in the study show the effectiveness of the proposed unsupervised CD method.
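For readers unfamiliar with the two reported metrics, here is a small sketch of how IoU and F1 are commonly computed for a binary change map. The function name `iou_and_f1` and the convention of returning 1.0 when both maps are empty are assumptions made for illustration; the abstract does not specify the evaluation code.

```python
import numpy as np


def iou_and_f1(pred, truth):
    """Return (IoU, F1) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()    # changed in both maps
    fp = np.logical_and(pred, ~truth).sum()   # predicted change, no true change
    fn = np.logical_and(~pred, truth).sum()   # missed change
    union = tp + fp + fn
    if union == 0:
        return 1.0, 1.0                       # assumed convention: both maps empty
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return float(iou), float(f1)


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    truth = [[1, 0], [1, 0]]
    print(iou_and_f1(pred, truth))
```

For a single mask pair the two metrics are tied by F1 = 2 IoU / (1 + IoU); this identity generally does not carry over to averages taken across several sites.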
Safe reinforcement learning (RL) with assured satisfaction of hard state constraints during training has recently received a lot of attention. Safety filters, e.g., based on control barrier functions (CBFs), provide a promising way for safe RL via modifying the unsafe actions of an RL agent on the fly. Existing safety filter-based approaches typically involve learning of uncertain dynamics and quantifying the learned model error, which leads to conservative filters before a large amount of data is collected to learn a good model, thereby preventing efficient exploration. This paper presents a method for safe and efficient model-free RL using disturbance observers (DOBs) and control barrier functions (CBFs). Unlike most existing safe RL methods that deal with hard state constraints, our method does not involve model learning, and leverages DOBs to accurately estimate the pointwise value of the uncertainty, which is then incorporated into a robust CBF condition to generate safe actions. The DOB-based CBF can be used as a safety filter with any model-free RL algorithm by minimally modifying the actions of an RL agent whenever necessary to ensure safety throughout the learning process. Simulation results on a unicycle and a 2D quadrotor demonstrate that the proposed method outperforms a state-of-the-art safe RL algorithm using CBFs and Gaussian process-based model learning in terms of safety violation rate and sample and computational efficiency.
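The notion of a safety filter that minimally modifies an action can be illustrated on a toy problem. The sketch below is not the paper's method: it assumes a scalar integrator x' = u + d, a safe set x <= x_max with barrier h(x) = x_max - x, and a disturbance estimate `d_hat` that is simply supplied by the caller (no disturbance observer is implemented). The names `cbf_safety_filter`, `alpha`, and `err_bound` are hypothetical.

```python
def cbf_safety_filter(u_rl, x, d_hat, x_max=1.0, alpha=2.0, err_bound=0.1):
    """Minimally modify u_rl so a robust CBF condition holds.

    With h(x) = x_max - x and x' = u + d, requiring h' >= -alpha * h for every
    disturbance within err_bound of d_hat gives
        u <= alpha * h(x) - d_hat - err_bound.
    The closest admissible action to u_rl is then a simple clip from above.
    """
    h = x_max - x
    u_limit = alpha * h - d_hat - err_bound
    return min(u_rl, u_limit)


def simulate(steps=500, dt=0.01):
    """Euler simulation with an aggressive nominal action pushing toward x_max."""
    x = 0.0
    xs = []
    for _ in range(steps):
        d = 0.3        # true disturbance (unknown to the filter, assumed constant)
        d_hat = 0.25   # imperfect estimate; its error 0.05 is within err_bound
        u = cbf_safety_filter(u_rl=5.0, x=x, d_hat=d_hat)
        x += dt * (u + d)
        xs.append(x)
    return xs


if __name__ == "__main__":
    xs = simulate()
    print("max state:", max(xs))
```

In this toy case the filter is the closed-form solution of a one-dimensional quadratic program; systems with multiple inputs would typically need a QP solver for the same projection.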
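To what extent do pre-trained language models grasp semantic knowledge about the phenomenon of distributivity? In this paper, we introduce DistNLI, a new diagnostic dataset for natural language inference that targets the semantic differences arising from distributivity, and we employ the causal mediation analysis framework to quantify model behavior and explore the underlying mechanism in this semantically related task. We find that the extent of a model's understanding is associated with model size and vocabulary size. We also provide insights into how models encode such high-level semantic knowledge.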
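Measuring the similarity between different tasks is critical in a broad spectrum of machine learning problems, including transfer, multi-task, continual, and meta-learning. Most current approaches to measuring task similarity are architecture-dependent: they either 1) rely on pre-trained models, or 2) train networks on the tasks and use forward transfer as a proxy for task similarity. In this paper, we leverage optimal transport theory and define a novel task embedding for supervised classification that is model-agnostic, training-free, and capable of handling (partially) disjoint label sets. In short, given a dataset with ground-truth labels, we embed the labels through multi-dimensional scaling and concatenate the dataset samples with their corresponding label embeddings. We then define the distance between two datasets as the 2-Wasserstein distance between their updated samples. Lastly, we leverage the 2-Wasserstein embedding framework to embed tasks into a vector space in which the Euclidean distance between embedded points approximates the proposed 2-Wasserstein distance between tasks. We show that the proposed embedding leads to significantly faster comparison of tasks than related approaches such as the Optimal Transport Dataset Distance (OTDD). Furthermore, we demonstrate the effectiveness of the proposed embedding through various numerical experiments and show statistically significant correlations between our proposed distance and the forward and backward transfer between tasks.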
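In this study, a semi-supervised learning (SSL) method is presented for improving urban change detection from bi-temporal image pairs. The proposed method adapts a Dual-Task Siamese Difference network that not only predicts changes with a difference decoder, but also segments buildings in both images with a semantics decoder. First, the architecture is modified to produce a second change prediction derived from the semantic predictions. Second, SSL is adopted to improve supervised change detection. For unlabeled data, we introduce a loss that encourages the network to predict consistent changes across the two change outputs. The proposed method was tested on urban change detection using the SpaceNet7 dataset. SSL achieved improved results compared to three fully supervised benchmarks.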